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A single McCulloch-Pitts neuron, that is, the simple perceptron is studied, with focus on the 
region beyond storage capacity. It is shown that Parisi's hierarchical ansatz for the overlap matrix 
of the synaptic couplings with so called continuous replica symmetry breaking is a solution, and as 
we propose it is the exact one, to the equilibrium problem. We describe some of the most salient 
features of the theory and give results about the low temperature region. In particular, the basics 
of the Parisi technique and the way to calculate thermodynamical expectation values is explained. 
We have numerically extremized the replica free energy functional for some parameter settings, and 
thus obtained the order parameter function, i. e., the probability distribution of overlaps. That 
enabled us to evaluate the probability density of the local stability parameter. We also performed a 
simulation and found a local stability density closer to the theoretical curve than previous numerical 
results were. 
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I. INTRODUCTION 

In his seminal paper Hopfield jlj reformulated Little's model || of auto-associative memory in terms of an energy 
function. By this act, the field of the statistical mechanics of neural networks was plowed and sown in, and proved 
itself since then remarkably fertile. The network is an interconnected set of McCulloch-Pitts neurons the latter 
being perhaps the biologically least realistic model of a nerve cell. In its simplest version the model neuron can be in 
one of only two states, "firing" or "quiescent", it is nonlinear, and admits a large number of connections from other 
^ , units. Despite the oversimplification in its node cells, the network can exhibit complex behavior and functions as a 
reasonable model of content addressable memory. From the practical viewpoint, however, artificial neural networks 
are generically not superior to other methods in the task of storage and associative retrieval • 

An important ingredient of this type of models of neural networks is what is called in statistical mechanics quenched 
disorder. For example, in the Little-Hopfield model the synaptic coupling strengths, made up of random numbers 
by the Hebb rule |,|, can be held fixed for the duration of the neural dynamical process. This is the basis for 
the analogy between spin glass models, the archetypical example of a many-body problem with quenched disorder, 
and neural networks Methods borrowed from the theory of spin glasses, in particular from the infinite range 

interaction Sherrington-Kirkpatrick model, yielded a harvest of results on a variety of neural model systems, as well 
as on other problems whose motivation came from outside of physics but could be formulated as disordered statistical 
mechanical systems 

It is safe to say that the technique inherited from spin glass theory and the most widely used in the statistical 
mechanical approach to neural networks in equilibrium is the replica method Q. It was first applied thoroughly for 
infinite range interaction spin glasses, thus it is especially suited for networks where neurons have a large number of 
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connections. In its simplest version, with so called replica symmetry, it can be straightforwardly adopted to many a 
neural problem S. 

Another family of artificial neural networks consists of layered feed-forward networks || , which accept a number 
of inputs and for a given set of couplings produce the output as a function of the inputs. Such networks were 
introduced for the purpose of generalization, i. e., rule extraction, from examples of input-output pairs. Then the 
a priori unknown target rule is to be reconstructed by adaptive changes in the couplings, called training. Since the 
introduction of the backpropagation algorithm || for that purpose, feed- forward networks found wide usage ||,|l0|,|llj . 

The statistical physics of neural modeling gained an impetus of lasting effect from the work by Gardner and Derrida 
P,p"2| on the storage problem of a single neuron, called also simple perceptron in that context. This is the simplest 
version jl3| of feedforward networks. While in the Little- Hopfield statistical mechanical system the quenched variables 
are the couplings and a microstate is a configuration of neural states, the roles in feed-forward networks are reversed. 
Now the space of synaptic couplings is considered as the configuration space within Boltzmannian thermodynamics 
and the examples appear as quenched parameters. The error for one example measures the difference between the 
actual output of the network and the required output. The errors on all training examples add up to form the 
Hamiltonian, i. e., the cost function, of the statistical mechanical model. Within the canonical statistical mechanical 
approach the temperature plays its usual roles. On the one hand, it is the Lagrange multiplier associated with a 
preset value of the error, on the other hand, it is the amplitude of noise if a gradient-descent-like dynamics of the 
couplings is used so as to reach optimal configuration. The thermodynamical limit is achieved by admitting a large 
number of adjustable couplings, but for that it is not necessary to have many neurons. The approach of Gardner 
and Derrida was successful in what is called equilibrium learning, when a Hamiltonian can be associated with the 
problem. However, statistical physical methods are proving themselves useful also in studying off-equilibrium learning 
algorithms 

A central quantity of a feed-forward network is its storage capacity, i. e., how many random input-output examples 
the network can reproduce without error. In terms of the statistical mechanical approach this is in its original 
formulation a zero temperature problem. The subspace of couplings that reproduce a given set of patterns is called 
version space, its volume, related to the ground state (T = 0) entropy, vanishes beyond capacity Since the statistical 
mechanical solution of the region below capacity of a single neuron by Gardner and Derrida [0U3]_a number of results 
have been obtained about storage properties of feed-forward networks, see for example Refs. Ill| , [l5| -[lgfl. Nevertheless, 
if the task is to minimize the number of incorrectly stored examples, beyond capacity the problem has not been solved. 
Technically this is because below capacity for completely random examples replica symmetry holds, while beyond it 
no finite replica symmetry breaking scheme yields thermodynamically stable solution |lj|. 

In the present paper we reconsider the problem of storage of random patterns, technically generalize Parisi's solution 
of the Sherrington-Kirkpatrick model, and obtain beyond capacity a phase reminiscent to the frustrated ground state 
of the Sherrington-Kirkpatrick model. That phase continues for T > into the analog of the low temperature, or 
Parisi, phase of the spin glass. 

Our work is motivated not only by the problem of storing random patterns. Generalization has also been successfully 
analyzed by statistical mechanical methods, see Ref. 0,^] for reviews. The storage problem below capacity is 
analogous to equilibrium learning of a learnable task, where the network is compatible with all possible examples, 
there is no frustration in either systems. For instance, equilibrium generalization properties of the perceptron when the 
examples are generated by another one, can be understood for arbitrary number of examples within replica symmetry 
plf . However, a given feed-forward network may be unable to reproduce a complicated target function on all possible 
examples |l(]] . If the target is unlearnable then the network is presumed to get into a frustrated phase if a sufficiently 
large number of examples are used. Thus the properties of a network beyond capacity are of foremost interest from 
the viewpoint of rule extraction as well. 

The single neuron may be considered as the hydrogen atom of neural problems and studied for its own interest. It 
is the unit of the Little-Hopfield network, where the symmetry in the couplings has been given up and the couplings 
of different neurons are considered as independent. As to a feed- forward network, even if the whole network operates 
without error, its units may still be strained beyond their individual capacities fig] . Thus the description of a single 
neuron beyond its storage capacity is of importance also from the viewpoint of networked neurons. Furthermore, 
a close analogy exists between the behavior of model neurons beyond capacity and the glassy, frustrated, phase of 
disordered spin systems (|,0,^,^3|,|l9|,^4| . Therefore, the understanding of the way a single neuron works may have 
ramifications beyond the field of artificial neural networks. 

We firstly review in Sec. [ij the statistical mechanics of storage, recall the basic thermodynamical quantities and 
the formula for the replica free energy of a single neuron. In Sec. Ill we summarize the main ingredients of Parisi's 
approach and obtain the free energy functional that we propose describes the equilibrium problem. This way some 
background is given to our previous communication p^ ] , wherein we identified a spin glass phase of Parisi type in the 
high temperature limit. The recipe for the calculation of expectation values by means of Green functions is explained 
in Sec. [V, producing among other the formulas for the local stability distribution and the stationarity conditions. Sec. 
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^ is devoted to the scaling for low temperatures, enabling us to put the extremization of the free energy functional 
on a computer, and in the end a simulation is discussed. 

II. THE STORAGE PROBLEM AND ITS REPLICA FREE ENERGY 

The model neuron, or perceptron, under consideration is |3|,|8| 

£ = sign(/i), (2.1a) 
h = N- 1 / 2 Y* J k S k , (2.1b) 

' * AC— 1 

where J is the synaptic coupling vector, S the input and £ the output. Patterns are given as input-output data, 

{S",^}f =1 . (2.2) 

In the simplest setup the S^-s are independently drawn from any distribution with unit variance and zero average, 
and £ M = ±1 with equal probability. We introduce the local stability parameter as 

A M = (2.3) 

where is given by ( p.lb| ) with S 1 ^. If the neuron generates ^ in response to S^, i. e., A M > 0, then we say that 
the n-th. pattern is stored. If A M is a large positive number then high stability of storage against changes in either 
the couplings or the inputs can be assumed. Large stability is associated with large basin of attraction in memory 
networks ||. Given the ensemble of patterns, the local stability parameter obeys some distribution p(A) p^j . If the 
number of patterns M is of order N then it is useful to introduce the relative number of examples 

a = M/N, (2.4) 

called also load parameter. Since an overall positive factor of J does not change the output, we set the norm of J to 
y/N, expressed by the prior distribution 

w(J) =C N §(n- |J| 2 ) . (2.5) 

This is called spherical constraint. The factor Cjv normalizes w(J) to unity, it has no thermodynamical significance 
besides setting the zero point of the entropy scale. Given the distribution of patterns and the length of J it can be 



easily seen that the normalization in (2.1b) results in h values of typically 0(1). 

Storage with minimal error can be formulated as an optimization task by our introducing an error measure. If we 
treat all patterns in the same way we obtain what is called the equilibrium problem. The associated Hamiltonian, or 
cost function, is 



M 



H = ^2V(A fi ), (2.6) 

where the potential V(A fl ) gives the error on a single pattern S M ,^'. Obviously, V(y) — for y > in the original 
storage problem. One can also impose a bound k for the local stability, i. e., V(y) is set to be zero for y > k, if n > 
this is obviously stricter than the original storage criterion. Generically, V(y) should be monotonically decreasing for 
y < k. In this paper specific results will be presented on the error counting, or Gardner-Derrida p|Jl^|, potential 

V(y) = 6(K-y), (2.7) 

where 9(y) is the Heaviside function. Given the potential (|]^) the aim is to minimize the number of patterns whose 
stability is below the bound k. In the theoretical framework we shall keep the general form V(y). 
The partition function of the optimization task is 

M 



Z = J d N J w(3) cxp ( -p J2 V(A») j , (2. 



3 



with j3 — 1/T. Quenched average, (. . .) qu , is defined as the mean over the patterns. We deal with the idealized 
equilibrium of the system, when for large N the free energy, the energy, and the entropy are assumed to approach 
their quenched average. This property of self-averaging was proved rigorously only in special cases, see for example 
p6| , but it is the widely used basis in studies of the equilibrium thermodynamics of disordered systems . 
The replica method [pP] consists in writing the mean free energy per coupling as 



f = — lim 



(ln^) qu 

Nf3 



lim lim • 



(Z n ) 



Iqu 



nNf3 



(2.9) 



Denoting the thermal average with the Boltzmann weight in (2.8) by 
as 



((^(A)) th ; 



qu 



1W 

a d/3 ' 



The entropy is 



= 0(ae - /). 



)th the mean error per pattern can be written 

(2.10) 

(2.11) 



For T = and e = the volume of version space, i. e., the space of couplings that perfectly reproduce the examples 
is obtained as Q = exp(A^s). In general, Ns has the usual meaning of the logarithm of the volume with given error e. 
Introducing the overlap matrix Q of synaptic vectors 



1 N 

[QLb = lab = JT ^ J akJbk 



(2.12) 



k=l 



where the replica indices o, b go from 1 to n, we can express the free energy as the result of the minimization of the 
replica free energy p5|,B2|,p 



/ 

/(Q) 
f s (Q) 

fe(Q) 



lim — min f(Q) 

n^o n Q J 

/ s (Q)+a/ e (Q), 
-(2/3)- 1 lndetQ, 



(3 



In 



d n x d n y 
(2tt) i: 



■ exp 



* — J a= 1 



ixy - tjxQx 



(2.13a) 

(2.13b) 
(2.13c) 

(2.13d) 



The subscripts s and e stand for entropic a nd e nergy-like terms. The entropic contribution (2.13c ) arise s because of 
the spherical constraint and the definition ( 2.12 ), it is indeed independent of the potential, while ( 2.13d ) depends on 
it. 

A central role is played by the probability density for the local stabilities |25|.|22|j 



p(A) = <(<S(A-/ l 1 e 1 )) th ) qu , 



(2.14) 



where (2.1b) is understood. Due to the symmetry of the Hamiltonian with respect to the perm utation of patterns we 
could choose fj, = 1 for convenience. It will turn out to be useful to interpret the integrand in (2.13d) as an effective 
Boltzmann weight and denote the average over this measure as 



where the n — > limit is implied. A straightforward replica calculation shows (see Ref. 
tion) that the local stability distribution can be rewritten as 

p(A) = ««5(A- yi )». 



(2.15) 

for a pedagogic presenta- 
(2.16) 



Here the subscript could be any replica index, for convenience we chose 1. Comparison with (2.14) allows an intuitive 
interpretation for the replica average ( (. . .)), namely, this corresponds to the combined average over t herma l and 
quenched fluctuations. From (2.14,2.16) it follows that the combined thermal and quenched average in (2.1C) boils 
down to 
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e=((V( yi ))) = / dyp(y)V(y). (2.17) 



The free energy ( 2.13] ) was calculated within the replica symmetric ansatz for the error counting potential (2.7) 



and the capacity, i. e., the maximal a with e = at T = was determined in ||,|l^|. It has been shown that beyond 
capacity the replica symmetric solution is thermodynamically unstable pq ]. One (2^^j| and two |L9| step replica 
symmetry breaking solutions were presented, while Ref. fll9j proved that no finite step symmetry breaking ansatz can 
possibly be thermodynamically stable. We presented in |24| a variational free energy functional without derivation 
that incorporated continuous replica symmetry breaking, but gave concrete results only in the high temperature, large 
a limit. In what follows we provide some background to the general theory and in the end properties of the ground 
state (T = 0) beyond capacity will also be described. 



III. THE PARISI SCHEME 



In this section we show how to evaluate the replica free energy / e (Q) of ( 2.13d| ) within Parisi's ansatz. The R step 
replica symmetry breaking form is |B9[ 



B+l 



Q = ^ (gr - Qr-l) U mr <g) \ n / mr , 



(3.1) 



r=0 



where k subscripts fc-dimensional matrices, U is the identity operator, all elements of Ufe equal 1, ® marks the direct 
product, and 



Q-i = < q < <?i . . . < q R < q R+ i = 1, 
m R+i = 1 < fnR < mR-i • ■ ■ < fn>\ < mo = n, 



(3.2a) 
(3.2b) 



where the integer m r is a divisor of m r -\. The n — > limit can be performed smoothly if instead of m r we use 
x r = (n— m r )/(n — 1) for parametrization. Thus for arbitrary n > we have the ordering 



Xr+i = 1 > x R > x R -i . . . > xi > x Q = 0. 



(3.3) 



We consider the a; r -s fixed along the n — > limiting process, whence follows the formal n-dependence of the m r (n)-s, 
and for n = 0we get x r — m r (0). The inspection o f the fi rst few R = 0, 1, 2 cases P, p2| , p3 19 allows, in the spirit of 
Parisi's p9|, the generalization of the energy term (2.13d) in the replica free energy to arbitrary R as 



/ e (q,x) - lim -/ e (Q) 

rwO n 



1 

fixi 



Dzq In / Dzi 



Dz 2 . 



Dz R+1 exp • 



/R+i ^ 

-pv ( ^2 z r\/qr - q r -\ 

\r=Q / 



(3.4) 



where 



Dz = 



dz e 2 s 



/2vr 



(3.5) 



This is the analog of Parisi's formula for the Sherrington-Kirkpatrick model, Eq. (11) in p9[ ; a comprehensive 
derivatio n will be presented elsew here p7| . The energy term (2.13d) has become a function of the parameters in 
(|3.2ajj3.3D . The evaluation of @ can be done by iteration, 



i>r-i{y) = Dz ip r (y + Zy/q T - q r -i) * r+1 



^ R+1 {y) = e-^y\ 



(3.6a) 
(3.6b) 



where = 1 is understood. Then the sought free energy term is obtained as 
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/ e (q,x) = -— I Dz lnVo^VSo)- 
pxi J 



(3.7) 



where q = (q , ...,q R ) and x = (xi, . . . , x R ). 

The above iteration can be redressed as a partial differential equation. Parisi's order parameter function x(q) is a 
concatenation of the q and x as 



r=0 



where X-\ — 0. Next we introduce ipiQiV) such that at the discontinuities 



i>(q-°,y)=iP(q+ > y) 7 ^, 
that is, i/)(g, y) 1 / 2 ^ 9 ' is continuous in q. Furthermore, along the plateau in the open interval (q r -i,q r ) 



ip{q, v) 



Dz tp (q r °,y + zy/q r - q) 



It is easy to show that the field so defined satisfies the partial differential equation (PDE) 

x(q) 



9 q ip{q,y) = -hd$"ip(q,y) + ^fl^{q,y) hiip(q,y), 



^(i,y) 



- P -PV(y) 



(3.8) 

(3.9a) 
(3.9b) 

(3.10) 

(3.11a) 
(3.11b) 



which evolves from q = 1 to q = 0. I ndeed, along the plateaus x(q) — when only the first term on the r. h. s. of 
( B.llaj ) remains, thus producing (3.1C). Near jumps of x(q) the second term dominates, and at a fixed y the resulting 
ordin ary differential equation in the variable q is separable. Hence it follows that ^(g, y) 1 ^^ is continuous in q, thus 
( 3.9b ) is recovered. An equivalent field can be defined by 



f(q,y) 

satisfying for a continuous potential V(y) the PDE 



px{q) 



(3.12) 



d g f(q,y) = -\d 2 y f{q,y) + \(3x{q) (d y f(q,y)) 2 
f(l,y) = V(y). 



(3.13a) 
(3.13b) 

The fact t hat / denotes the free energy ( 2.13a[) as well as the field f(q,y) should not cause misunderstandings. 
Equation ( |3.13a| ) with initial condition In 2 cosh j3y has been discovered by Parisi |29| while studying the Sherrington- 
Kirkpatrick model. 

If the potential V(y) is not continuous, the PDE (3.13) holds only from any q* < 1 onward where lnip(q*,y) is 
continuous in y. In the generic case such is qn, so the evolution along the first plateau from 1 to qu, where x(q) = 1, 
is to be done explicitly as 



f(qn,y) 



f3x R 



p-Hn { Dz e-^G/WI^), 



+ i 



(3.14) 



and for q < q R the PDE ( |3.13a ) with the initial condition (3.14) can be used. Alternatively, one can introduce the 
field m{q, y) as 



ip{q 7 y)m(q,y) 



dy^jq^y) 
(3x{q) ' 



(3.15) 



which equals d y f(q,y) when continuity in y holds. Then the PDE (3.13a) should be replaced by 



d g f{q,y) = - ^d y m(q,y) + \(3xm 2 {q,y), 



(3.16) 



G 



while the initial condition ( [3. 13b ) can be kept. 

Finally we obtain the sought free energy term ( 2.13d ) as a functional of the order parameter function 



/e[aj(«)] = lim-/ e (Q)=/(0,0). 

n— >0 n 



(3.17) 



It should be emphasized that the above PDE-s do not require infinite refining of the partition by q r -s of the interval 
(0,1). They are valid for discrete as well as continuous replica symmetry breaking schemes, i. e., they admit x{q) 
with steps and platea us, as well as strictly monotonically increasing continuo us seg ments. 

The entropic term ( 2.13c ) can also be cast in the form of the energy term ( [2.13d ) with the substitution 



-PV(y) 



(3.18) 



If we conceive the Dirac delta as a Gaussian with small variance, the initial con dition (3.13b) becomes a quadratic 
function and the PDE ( 3.13a ) can be solved analytically. The analogue of ( 3.17 ) for the entropic term, after going 
with the variance to zero, can be cast into 



/.[*(?)] = lim i -f s (Q) = -±- [ dq 
n—>0 n 2p „/ 



1 



1 



D{q) 



(3.19) 



where 



D(q) 



dq x(q). 



(3.20) 



The free energy of the neuron is then obtained as 



/ = max / [x(q)] , 

x(q) 



(3.21) 



where the free energy functional is 



f[x(q)] = f a [x(q)] + af e [x(q)], 



(3.22) 



with f s and f e defined in Eqs. ( 3.19 , 3.17 ). It is due to the n — ► limit that the maximization in ( 3.21 ) replaces the 
minimization by the matrix elements of Q in (2.13a), see, e. <?., Ref. 0. 



IV. LINEAR RESPONSE THEORY, STATION ARITY CONDITIONS, AND EXPECTATION VALUES 



The least obvious part of the extremizatio n cond ition ( 3.21 ) is the variation of /(0, 0) by x{q). This can be calculated 
from linear response theory for the PDE ( p. 13a ). Moreover, linear response theory yields a technique to calculate 
replica averages as introduced in (2.15 ), es sential for the evaluation of physical quantities. 

The Green function for the PDE (3.13a) can be introduced formally as 



Sf(q2,V2)' 



(4.1) 



whence G(qi, yv, 92, IJ2) = for ai > 17 2 ■ The Green function for the Parisi solution of the Shcrrington-Kirkpatrick 
model has been studied in Refs. |B0j , Bl[ . In the fore and hind variable pairs the Green function G{qi, yi', 92, 2/2) satisfies 
the respective PDE-s 



d qi G = -\d\ y G + f3x{q 1 )m(q 1 ,y 1 )d yi G - 8(qi - q 2 )5{y\ - y 2 ), 



d„ 2 g 



h d m^ + P x {l2)dy 2 [m(q 2 , y 2 )G) + S(qi - q 2 )S(yi - y 2 ), 



(4.2a) 
(4.2b) 



where m is given in (3.15). The first equation without the Dirac delta excitation is the linearization of the PDE 
(3.13a). The minus sign o f the Dirac deltas follows from the fact that (4.2a) evolves towards decreasing "time" q. 



The homogeneous part of (4.2b) is obtained from the requirement that 
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5(91,2/15 93,2/3)= / dy 2 g(qi,yi;q2,y2)G(q2,y2 m ,q3,y3) 



(4.3) 

does not depend on q 2l and the plus sign of the inhomogeneous term is due to the fact that evolution goes towards 
increasing q. The homogeneous parts of two PD E-s ( 4.2b ) and ( 4.2b| ) are called adjoint to each other. 

For the sake of simplicity we g ave formula (4.1) for the case of continuous potential V(y). If the potential is 
discontinuo us t hen the definition (4.1 ) should and can be appropriately modified when a g-argument is near 1, but 
the PDE-s ( [4.2D for the Green functions hold as they are. 

The significance of the Green function is in that it helps to solve the linear PDE with the source term h(q, y) 



d q tf(q, y) = -^d*$(q, y) + (3x(q)m(q, y)d v &(q, y) + h{q, y), 



as 



&{q,y) = J dyiG v (q, y; 1, J/i)«?(l,2/i) 
A prominent role will be played by 



dqi / dyig ip (q,y;q 1 ,yi)h(q 1 ,y 1 ). 



which solves the PDE ( 4.2b ) with qi = yi 



P(q,y)=G(0,0;q,y), 
0, i. e., with initial condition 
P(0,y)=S(y). 



(4.4) 



(4.5) 



(4.6) 



(4.7) 



This function first appeared in the context of the Sherrington-Kirkpatrick model in Ref. (3^] . Note that the PDE for 
P(q, y) is in fact a Fokker-Planck equation, producing a nonnegative solution and conserving the norm J dy P(q, y) = 1 
for all q-s. This suggests the intuitive interpretation of P(q, y) as probability density o f y. 



Now we are in the position to calculate the variation of th e free energy functio nal (3.22). As to the energy term 
( 3.171) , by varying the functions f(q,y) and x{q) in the PDE ( 3.13a ) one obtains (4.4) with ■d = Sf, J?(l,y) = 0, and 
h = ±f3(d y f) 2 5x. Hence ([O) gives at q = 0, y = the sought Sf(0, 0)/Sx. The variation of f s [x(q)] can be calculated 
straightforwardly, and, with the notation (4.6), we arrive at 



F(q, [x(q)}) 



2 Sf[x(q)] 



dq 



a / dyP(q,y)m(q,yf 



(4.8) 



P Sx(q) \J 2 D(qf 
The stationarity condition in case x(q) can be freely varied is thus 

F(q, [x(q)}) = 0. (4.9) 
If in an interval I the x(q) is supposed to have a plateau, we differentiate by the plateau value of x{q) to get 



dxF(q, [x(q)})=0. 



(4.10) 



In isolated points q r where plateaus meet ( |4.9| ) should hold pointwise. This summarizes the stationarity conditions 
for an arbitrary order parameter function, i. e., arbitrary replica symmetry broken scheme of Parisi type. 

We ha ve se en in Sec. [ll| instances when the double, thermal and quenched, average could be replaced by the replica 
average (|J5|). Replica averages can be calculated by the Green function technique as described below. The procedure 
can be viewed as the generalization of the groundbreaking results from Refs. |3^,Q, where the local magnetization 
and some of its moments in the Parisi phase of the Sherrington-Kirkpatrick model were evaluated. 

A simple case is when ((A(y a ))) is to be calculated for an arbitrary function A(y). Because of the symmetry with 
respect to the permutation of sin gle rep lica indices we have ((A(y a ))) = ((n^ 1 J2a=i ^(fa)))- This quantity can be 
easily evaluated if one replaces in ( 2.13d ) V(y) by V(y) + XA(y), thus obtains / e (Q; A ), and cal culat es its initial slope 
at A = 0. Reversing the limits n — > and A — > then using the first equality of (3.4) and Eq. (3.17) we get 



II ai \\\ v 1 <9/e(Q;A) 
{{My a))) = hm — 

n->0 n OA 



9/(0,0; A) 



A=0 



A=0 



d y y^A(y)= ! ,/,,/>(!. ,/i.U,/). 



(4.11) 
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The third equality comes from the fact that A is in the initial condition for /(g, y) at q = 1, and the last one comes 
from the definitions for the Green function (4T) and for P(q,y) ((0]). Again, for the sake of brevity we gave the 
deriva tion for continuous potential V(y), however, the result holds also for discontinuous ones. Immediately follows 
from ( |2.16| ) the formula for the probability density of the local stabilities 



and thus from ( [2.17 ) 



p(y) = P(l,y), 



dyP(l,y)V(y). 



From the practical viewpoint interesting are effective averages of products like 

({A 1 {y ai )A 2 (y a2 )...A k (y ak ))). 



(4.12) 



(4.13) 



(4.14) 



One can show by using elementary properties of t he Fou rier transformation that the average of a product of x a -s over 
the effective Boltzmann weight on the r. h. s. of (2.13d) can be expressed as averages over functions of variables y a s. 
Thus the knowledge how to evaluate ( 4.14 ) also resolves the problem of averages of polynomials in x a -s. The latter 
quantities are of importance because they appear when t he rep lica free energy is differentiated in terms of q a b-s. 

Here we shall only describe the recipe for calculating ( 4.14 ), details can be found in Ref. ^7j. If k = 2 then the 
average depends on q — q ai a 2 an d is given by the formula 



C 12 (q) = ((A 1 (y ai )A 2 (y a2 ))) 

dy dy 2 dy 3 P{q, y) Q{q, y; 1, yi) Q{q, y; 1, y 2 ) Ax{yi) A 2 (y 2 ). 



(4.15) 



Of such type is df e (Q)/dq aia2 where A 1 M = A 2 (y) 
second term in F(q, [x(q)]) given in Eq. (4 



(l,y), see ( 3.15| ) for definition, whence we obtain the 



This is related to the fact that the stationarity condition can also be 
obtained by first differentiating the replica free energy ( [2.13b| ) by q a b and then equating the result to zero. For k = 3 
suppose without restricting generality that 

q = q ai a 3 = Qa 2 a 3 < q = q ai a 2 - (4.16) 

Then we have 

Cl2 3 (q,q) = {(MVai) MVaz) MVas))) 

>lii (hj dm. ihj2 <iii.: P(q, y) G{q, y; 1, 2/3) S(q, y; q, y) G(q, y; 1, yi) G{q, y; 1, y 2 ) 

xA^A^y^Asiys). (4.17) 

Special versions of the above formulas, for the case of the second and third moments of the magnetization in the 
Sherrington-Kirkpatrick model, were worked out in Refs. [ p0|]3"3| ]. The integrals in (4.17) admit a simple graphic 
representation as shown on Fig. 1. 








FIG. 1. Correlation function Ci23(<?, g). The right endpoints correspond to the functions to be averaged and lines are Green 
functions (only one of them is labeled by Q in the figure). All nodes, including the right endpoints, have y-s which should 
be integrated over. The leftmost line is the function P(q,y), or equivalently, this line is also a Green function and a 8(y) is 
associated with the leftmost endpoint. 



The cases considered above are the effective average for k = 1, given by Eq. (4.11), represented by one line, and for 
k = 2, as calculated in (4. IE), represented by a fork with one handle and two branches. Averages of more than three 
functions can be analogously constructed, for a given k a graph has k + 1 "legs" . Obviously there are two topologically 
possible graphs for k = 4, depending on the overlaps q ai a v and more for larger k-s. 

The ability to calculate k = 4 effective averages allows us to study linear stability of the replica free energy 
( 2. 13b| ) at the stationary Q. Using the results of Ref. |34|] on ultrametric matrices we expressed the so called replicon 
eigenvalues in terms of Green functions. While a general proof of the fact that there are no negative eigenvalues 
in the case of continuous replica symmetry breaking, i. e., when x(q) has a continuously increasing segment, is 
not available, in the high temperature limit p^j2^ | we confirmed the absence of linear instability against replicons 
whenever we encountered such a stationary state. For any temperatures we recovered analytically the zero eigenvalues, 
corresponding to Goldsto ne m odes, as well as the lowest order Ward-Takahashi identities predicted by algebra p5fl . 

The generalization of (4.14) to non-factorizable functions is straightforward. For example, such functions would 
simply replace the products of Ak-s in (4.15) and (4.17). 



V. LOW TEMPERATURE RESULTS 

In Ref. [ pij we have shown that in the high temperature limit, i. e., for a, f3 — > oo with 7 = a/3 2 finite, the problem 
simplifies to the extent that if x(q) has a continuously increasing segment, this can be given in a closed analytic form. 
In that limit the problem becomes equivalent to the spherical, mult i-p- spin interaction spin glass [B6[, wher e a similar 



observation has been made. Four different phases have been found |24j for the error counting potential (2.7): for small 
7 replica symmetry holds, and for |k| < 2 and large 7 there is a Parisi phase with a single continuously increasing 
segment of x(q) between the trivial plateaus x = and x = 1. When \k\ > 2 there is also a narrow one-step replica 
symmetry broken regime, and for large enough 7-s equilibrium is characterized by an x(q) that is a concatenation of 
a nontrivial plateau and a continuously increasing segment. While for T > there cannot be error free storage, it is 
plausible to conceive the replica symmetric regime as the continuation of the T = phase of perfect storage and the 
symmetry broken phases as the analog of the regime at T = beyond capacity. 

In the case of a V(y) potential that vanishes for y > n the limit of capacity is given for k > at T = by Q 

oi c (k)=(J Dt{K-tf^j , (5.1) 
as it follows from the replica symmetric solution when q — * 1. This formula also gives the limit of the de Almeida- 



Thouless (AT) local stability |28[| in the case of the potential (2.7). For n — one has a c = 2, and, for increasing n, 
a c (n) understandably decreases. As it has been already mentioned, beyond capacity none of the finite-step replica 
symmetry breaking schemes gives a locally sta ble equilibrium state fill . Thus in this regime the order parameter 



function is no longer of the step-like form of (3.8), rather it has a continuously increasing part. This makes it necessary 
to numerically solve the extremization problem ( |3.2l|) . 



The ground state (T=0) has its special scaling properties. The PDE (3.13a) stays meaningful if for q < 1 the 
function (3x(q) does not diverge, implying that x{q) goes to zero. Given the meaning of x(q) as the probability of 
a q being in the interval (0, q) ||, we can say that at T = the overlap q is 1 with probability 1 for all a > a c . 
Nevertheless, a physically meaningful order parameter is obtained after scaling by (3. Firstly we introduce for T > 
the parameters of the classic Parisi shape as 

x(q) = if < q < q {0) 

x(q) > and x (0 ) < x(q) < x {1) if q (0 ) < q < q(x) (5.2) 

x(q) = 1 if <7 (1) < q < 1. 

The scaled quantities 
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q{t) =9(1) -(<7(i)- 9(0)) (1-(1 + 9(1)) t + q^t 2 ), 0<t<l, (5.3a) 

C(t)-/3.x(g(i))g(i), (5.3b) 

77 = /3 (1 - , (5.3c) 

A(t)=0D(q(t))= [ £(i)dt + ri, (5.3d) 



are expectedly regular even in the T — * limit, when qn\ — > 1. Note that there is an arbitrariness in the parametriza- 
tion by t, the m ain features being that q(0) = q(o), 9(0) = 9(o)> 9(1) — 9(1) > an d 9(1) — 0. With this parametrization 
the PDE |uf ) becomes 



j/) = -\ q(t) d 2 J(t, y) + \ at) (d y f(t, y)f , (5.4a) 



/(l,y) = -~ln / Dze-^^+V 1 -^)). 

P J 



(5.4b) 



For T = the initial condition becomes 



(y - y ) 2 

y \ yc " ' 2i] 



1 if y < k — \/2rj 



f(t = l,y)\ T=0 = min ( V(y) + ™ J' ) = { \in-^<y<K (5.5) 

if y > k, 



where (2.7) was substituted to get the second equality. By Gaussian integration hence the replica symmetric solution 
can be obtained p| p^j25[ |. Note that f(t — l,y) is a continuous function even though V(y) is the step function. 
Although we used the same symbols for functions of q and t, misunderstanding are avoided by our marking which 
argument we mean. 



The PDE for P(q, y) follows from the definition ( |4.6|) and from the evolution equation of the Green function (4.2t). 
The latter can be properly rescaled according to ( |5.3|) , yielding in principle the solution P(t,y). The pro bab ility 
density of local stabilities is obtained by evolving P(t = 1, y) = P(q = q^ , y) to P(q = 1, y). At T — with fl2.7p we 
have 

P(1,A) ifA<K-V2^/ 
P (A) = {0 if K -72^<A< K (5.6) 



P(l,A) + 6(A- K ) f«_^dyP(l,y) if A > 



where P(l, A) = P(t = 1, A) is understood. For arbitrarily small T > the gap in the support of p(A) immediately 
vanishes. For details we again refer to [B7fl. 



The numerical extremization was done with complementing the free energy functional (3.22) by constraints. The 
PDE (jO| ) was added giving rise to a Lagrange multiplier field. This field can be shown to be just P{t, y), see Refs. 
p4] , p7j]3*l*r . Further technical requirements are x(q^) > 0, £(<Z(i)) < 1, and ±{q) > for mm < q < q(i), which were 
taken into account by soft constraints. A few results for the Gardner-Derrida potential (pj), with k — 0, a = 3, are 
displayed on Figs. 2 and 3 for various low temperatures. Note that the point (k = 0, a = 3) lies beyond capacity, 
while beyond about /3 _1 = T = 0.2 the RS solution satisfies the AT stability condition |27| . 

On Fig. 2 the scaled order parameter function (3x(q) is s how n. For small temperatures the replica symmetric 



solution is AT unstable, and we indeed obtain the Parisi form (5^2) for x(q). At g( ) near 0.75 the functions x{q) jump 



to zero and rem ains there as q further decreases. The upper plateau with x(q) = 1 starts at q^iy The consistency of 



the scaling (5.3) is confirmed by our finding at T = finite values for flx(q) if q < 1 and for 7/ w 0.26. Interestingly, 
the curved segment of (3x{q) does not change much with increasing temperature, the main effect being the decrease 
of 9(1). 
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100 - 




FIG. 2. Scaled order parameter function x(q) for k = 0, a — 3 at T = (solid), T — 0.01 (dashed), and T = 0.1 (dotted). 



T he local stability distribution p(A) of ( [2.1 6| ) is d isplayed on Fig. 3. It was numerically obtained for T = from 
( |5.6|) and for T > from the original formula ( 4.12 ), with the same parameter values as in Fig. 3. For the sake of 
better visibility of the other details, the very high peaks near A = n = for T = 0.1 and T = 0.01 as well as the 
corresponding <5-peak for T — have been omitted from this plot. For |A| > 3 the curves approach zero very quickly. 
A true gap with p(A) — to the left of A = develops only if T — 0, while for the positive T-values p(A) is positive 
albeit small there. While the gap at T — is present in R = 0,1,2 step replica symmetry breaking schemes, see 
Refs. [^5|,^2],[l9| respectively, in all these cases a jump appears near the lower edge. This can be associated with the 
thermodynamic instability of those saddle points |37|) . Our present solution gives linearly vanishing p(A) at the lower 
edge, signaling the absence of replicon instability [B7J . 



0,8 



0,6 



< 



0,4 



0,2 






A 



FIG. 3. Density of local stabilities p(A) from theory for n = 0, a = 3 at T = (solid), T = 0.01 (dashed), and T = 0.1 
(dotted). 
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In order to compare theory with practice we performed a medium scale simulation. The standard Hebbian algorithm 
was modified by Wendemuth H| to provide convergence for negative stabilities. Since we chose n = l, the final steps 
during stabilization of a pattern went on wi th A > 0, so in our case the modification was not essential. The algorithm 
goes as follows. Firstly random patterns (2.2) are generated uniformly from an interval centered about zero and 
normalized as J2k=i(^k) 2 ~ and all outputs £ M are taken uniformly 1. This does not restrict generality since S% 
have random signs. The initial coupling vector is set to be proportional to 



M 



J*(0)cx£ 5 fc' 



(5.7) 



such that its Eucledian norm is N. In the i-th step of the algorithm one calculates the local stabilities 

J(t) ■ 



A"(f) 



|J(*)I 



(5.8) 



and selects the least unstable pattern, i. e., the one with the largest A M (i) < k. Let us denote its index by /iq, whose 
argument t we omit. Next one augments the couplings as follows. If A A1 °<*) (t) > then 



J(i + 1) = J(t) + XS"°, 
and if A"°W(t) < we have following Wendemuth 

N/\J(t)\ -A*>(t) 



J(t+l) = J(t)+A S w +J(i) 



|J(t)|-AM0( t ) 



(5.9) 



(5.10) 



Here A is the gain parameter, the overall scale of increments of J. In Ref. the gain parameter was A = iV~ 3 / 2 , 
after experimentation we chose A = TV -1 . Such an increase in the gain parameter did not endanger, rather sped up 
convergence. At time t + 1 we again look for the least unstable pattern, and so on. The algorithm goes on until it gets 
stuck with one pattern that we are not able to stabilize in a reasonable time. The intuitive idea behind the algorithm 
is that since an unstable pattern counts as error irrespective of the distance of the stability parameter A M from k, 
one assumes that it is the easiest to stabilize the pattern with A^ closest to K. So one may hope that thus the largest 
possible number of patterns can be stabilized. 

The program ran on 28 PC-s in parallel, each having an AMD K6 processor of 333 MHz, during about one day. 
Fig. 4 shows both t he th eoretical curve and the results of the simulation for a = K = 1, a point known to fall beyond 
the capacity curve (5.1). The full line is the result of numerical extremization of the free energy functional in the 
way Fig. 3 was obtained. The Dirac delta peak of the theoretical probability density at k is not illustrated. The 
discontinuous lines represent the histograms for the local stabilities from simulation for two sizes, M = N = 500 and 
1000, after normalization. 
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FIG. 4. Density of local stabilities p(A) at a = k = 1, axes as in Fig. 3. The theoretical prediction is given by the full line. 
The two empirical densities are normalized histograms, taken with M = N — 500 and 1000. 



The closeness of the two histograms demonstrates that size effects were probably not the cause for the systematic 
difference between theory and the numerical experiment. A possible ground for the discrepancy is that the algorithm 
may have been halted prematurely. However, the time necessary for the stabilization of patterns was allowed to grow 
for each subsequent pattern, and the algorithm was ended only when stabilization did not occur even within the 
multiple of such an extrapolated time. Another possible reason for the deviation may be that the algorithm got stuck 
in a "local optimum" without being able to globally maximize the number of stable patterns. In this regard several 
modified initial conditions were tested but the number of stabilized patterns did not grow in the end. A source of 
concern can be that the built in random number generator of the C compiler was used; we did not test other routines 
for this purpose. As to the algorithm, despite its intuitive appeal, there is no proof that it would be able to globally 
minimize the Hamiltonian (2.6) with error measure ( 27fl) . Furthermore, it is likely that with the present parameters 
the learning task is an NP-complete problem [p^ ^8||3£ |7~thus explaining imperfect convergence. 

We emphasize that the results represent a significant improvement with respect to the earlier simulation in Ref . |39| . 
The error per example e found in [|9| is about 0.21, while the present data correspond to 0.15 and theory predicts 
0.1358. Thus the deviation between simulation and theory has been decreased by 80%. That means that we stabilized 
more patterns than fl39| |, although, given the difference between the theoretical and simulation results, we still could 
not find the global optimum. Furthermore, an important feature of the density p(A) is that it should continuously 
vanish, with a nonzero slope, at the lower edge of the gap. This property is reproduced by the simulation data, in a 
sharper fashion with the larger M = N = 1000 size, while the value of the edge remains slightly overestimated. 
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